{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "lovely-birthday",
   "metadata": {},
   "source": [
    "# Shapley Additive Explanations\n",
    "\n",
    "Link to API Reference: [ShapKernel](./python/api/ShapKernel.ipynb)\n",
    "\n",
    "*See the backing repository for SHAP [here](https://github.com/slundberg/shap).*\n",
    "\n",
    "<h2>Summary</h2>\n",
    "\n",
    "SHAP is a framework that explains the output of any model using Shapley values, a game theoretic approach often used for optimal credit allocation. While this can be used on any blackbox models, SHAP  can compute more efficiently on specific model classes (like tree ensembles). These optimizations become important at scale -- calculating many SHAP values is feasible on optimized model classes, but can be comparatively slow in the model-agnostic setting. Due to their additive nature, individual (local) SHAP values can be aggregated and also used for global explanations. SHAP can be used as a foundation for deeper ML analysis such as model monitoring, fairness and cohort analysis.\n",
    "\n",
    "<h2>How it Works</h2>\n",
    "\n",
    "Christoph Molnar's \"Interpretable Machine Learning\" e-book [[1](molnar2020interpretable_shap)] has an excellent overview on SHAP that can be found [here](https://christophm.github.io/interpretable-ml-book/shap.html).\n",
    "\n",
    "The conceiving paper \"A Unified Approach to Interpreting Model Predictions\" [[2](lundberg2017unified_shap)] can be found on arXiv [here](https://arxiv.org/pdf/1602.04938.pdf).\n",
    "\n",
    "If you find video as a better medium for learning the algorithm, you can find a conceptual overview of the algorithm by the author Scott Lundberg below:\n",
    "[![The Science Behind InterpretML: SHAP](https://img.youtube.com/vi/-taOhqkiuIo/0.jpg)](https://www.youtube.com/watch?v=-taOhqkiuIo \"The Science Behind InterpretML: SHAP\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "overhead-tournament",
   "metadata": {},
   "source": [
    "<h2>Code Example</h2>\n",
    "\n",
    "The following code will train a blackbox pipeline for the breast cancer dataset. Aftewards it will interpret the pipeline and its decisions with SHAP. The visualizations provided will be for local explanations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "personalized-symphony",
   "metadata": {},
   "outputs": [],
   "source": [
    "from interpret import set_visualize_provider\n",
    "from interpret.provider import InlineProvider\n",
    "set_visualize_provider(InlineProvider())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "robust-possible",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import load_breast_cancer\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.decomposition import PCA\n",
    "from sklearn.pipeline import Pipeline\n",
    "\n",
    "from interpret import show\n",
    "from interpret.blackbox import ShapKernel\n",
    "\n",
    "seed = 42\n",
    "np.random.seed(seed)\n",
    "X, y = load_breast_cancer(return_X_y=True, as_frame=True)\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)\n",
    "\n",
    "pca = PCA()\n",
    "rf = RandomForestClassifier(random_state=seed)\n",
    "\n",
    "blackbox_model = Pipeline([('pca', pca), ('rf', rf)])\n",
    "blackbox_model.fit(X_train, y_train)\n",
    "\n",
    "shap = ShapKernel(blackbox_model, X_train)\n",
    "shap_local = shap.explain_local(X_test[:5], y_test[:5])\n",
    "\n",
    "show(shap_local, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "noble-issue",
   "metadata": {},
   "source": [
    "<h2>Further Resources</h2>\n",
    "\n",
    "- [SHAP GitHub Repository](https://github.com/slundberg/shap)\n",
    "- [Conceptual Video for SHAP](https://www.youtube.com/watch?v=-taOhqkiuIo)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "pleased-premiere",
   "metadata": {},
   "source": [
    "<h2>Bibliography</h2>\n",
    "\n",
    "(molnar2020interpretable_shap)=\n",
    "[1] Christoph Molnar. Interpretable machine learning. Lulu. com, 2020.\n",
    "\n",
    "(lundberg2017unified_shap)=\n",
    "[2] Scott Lundberg and Su-In Lee. A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874, 2017."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
